TO ALL WHOM IT MAY CONCERN:

BE IT KNOWN THAT WE, Hideo Miyake, a citizen of
Japan residing at Kawasaki, Japan, Atsuhiro Suga, a citizen
of Japan residing at Kawasaki, Japan, Yasuki Nakamura, a
citizen of Japan residing at Kawasaki, Japan, and Yoshimasa
Takebe, a citizen of Japan residing at Kawasaki, Japan, have
invented certain new and useful improvements in

PARALLEL PROCESSOR

of which the following is a specification.

TITLE OF THE INVENTION
PARALLEL PROCESSOR

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to processors,
and, more particularly, to a parallel processor that
executes a plurality of basic instructions in parallel.

2. Description of the Related Art

Generally, in a conventional computer system, a plurality
of basic instructions are executed in parallel by pipeline
processing, thereby improving its performance.
Conventionally, a plurality of basic instructions
constitute a fixed-length instruction word, and a very long
instruction word (VLIW) technique is employed as a method
for executing a plurality of basic instructions contained
in one instruction word in parallel. Also, a superscalar
technique may be employed. In accordance with the
superscalar technique, basic instructions are executed in
parallel depending on the number of basic instructions
contained in each instruction word.

FIG. 1 shows the structure of a conventional parallel
processor 10. This parallel processor 10 comprises an
instruction fetch unit 1 connected to a memory 7, an
instruction issue unit 3 connected to the instruction fetch
unit 1, instruction execution units EU0 to EUn each
connected to the instruction issue unit 3, and a register
unit 5 connected to each of the instruction execution units
EU0 to EUn.



The instruction fetch unit 1 fetches an instruction word
from the memory 7, and supplies the instruction word to the
instruction issue unit 3. The instruction issue unit 3
issues the basic instructions contained in the supplied
instruction word to the instruction execution units EU0 to
EUn. If the instruction execution units EU0 to EUn are
still executing previous basic instructions at this point,
the instruction issue unit 3 waits for the end of the
execution. When the execution ends, the instruction issue
unit 3 supplies the basic instructions to the instruction
execution units EU0 to EUn.

The instruction execution units EU0 to EUn execute the
basic instructions, and notify the instruction issue unit 3
of the end of the execution. The register unit 5 supplies
data to the instruction execution units EU0 to EUn, if
necessary, and holds the execution results of the
instruction execution units EU0 to EUn. The externally
connected memory 7 stores the instruction word string to be
executed in the parallel processor 10. The memory 7 also
stores necessary data for the execution units EU0 to EUn to
execute instructions, and data as the execution results.

FIG. 2 shows the formats of instruction words to be
supplied to a parallel processor having four instruction
execution units EU0 to EU3. As shown in FIG. 2, each
instruction word is made up of a basic instruction EI and a
do-nothing instruction NOP. If the number of basic
instructions contained in one instruction word to be
executed in parallel is smaller than the number of the
instruction execution units EU0 to EU3, the proportion of
do-nothing instructions is large.

In the conventional parallel processing method of
executing a plurality of basic instructions by the VLIW
technique, each instruction word has a fixed length.
Therefore, if the number of basic instructions to be
executed in parallel is smaller than a predetermined
number, do-nothing instructions are added to comply with
the predetermined length. Because of that, in a program
having a small number of basic instructions in total, the
proportion of do-nothing instructions is large, and the
amount of instruction code increases accordingly, resulting
in problems such as poor usage efficiency of memory, a
decrease of the hit ratio of cache memory, and an increase
of the load on the instruction fetch mechanism.

With the superscalar technique, there is also a problem
that a large-scale circuit is needed to increase the number
of instructions to be executed in parallel.
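
The padding cost of fixed-length instruction words can be
illustrated with a short sketch. It assumes a hypothetical
program described only by the number of basic instructions
in each instruction word, and a four-slot word as in FIG. 2.
The function name and the sample counts are illustrative and
do not come from the specification.

```python
def nop_fraction(counts, slots):
    """Fraction of all slots that hold NOPs when every word is padded to `slots`."""
    if not counts:
        return 0.0
    if any(c < 1 or c > slots for c in counts):
        raise ValueError("each word must hold between 1 and `slots` basic instructions")
    total_slots = len(counts) * slots   # fixed-length words: every word has `slots` slots
    useful = sum(counts)                # slots that hold a real basic instruction
    return (total_slots - useful) / total_slots


# Hypothetical program: five instruction words for a four-unit processor.
counts = [1, 2, 1, 4, 2]
print(nop_fraction(counts, slots=4))
```

In this made-up program half of the stored instruction code
is padding, which is the memory and fetch overhead the
paragraph above refers to.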



SUMMARY OF THE INVENTION

A general object of the present invention is to provide
parallel processors in which the above disadvantages are
eliminated.

A more specific object of the present invention is to
provide a parallel processor that is capable of performing
highly efficient parallel processing.

The above objects of the present invention are achieved by
a parallel processor that performs parallel processing of
one or more basic instructions contained in each of
instruction words delimited by instruction delimiting
information, the parallel processor comprising:

a plurality of instruction execution units that perform
processes corresponding to the supplied basic instructions
in parallel;

an instruction fetch unit that fetches the instruction
words one by one in accordance with the instruction
delimiting information; and

an instruction issue unit that selectively issues each of
the basic instructions supplied from the instruction fetch
unit to one of the instruction execution units to execute
the basic instruction.

With the parallel processor having the above structure,
the instruction fetch unit fetches the instruction words
one by one in accordance with the instruction delimiting
information, so that the length of each instruction word
can be made variable. Also, the instruction execution units
can efficiently execute the instruction words, because each
of the basic instructions is selectively issued to a
corresponding one of the instruction execution units.

The above and other objects and features of the present
invention will become more apparent from the following
description taken in conjunction with the accompanying
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the structure of a conventional parallel
processor;

FIG. 2 shows the formats of instruction words to be
supplied to a conventional parallel processor having four
instruction execution units;

FIG. 3 shows the structure of a first example of a
parallel processor in accordance with a first embodiment of
the present invention;

FIG. 4 shows the structures of an instruction fetch unit
and an instruction issue unit of the parallel processor
shown in FIG. 3;

FIG. 5 shows the formats of instruction words to be
supplied to the parallel processor of the first embodiment
of the present invention;

FIG. 6 shows the structure of a second example of the
parallel processor in accordance with the first embodiment
of the present invention;

FIG. 7 shows the structure of a first example of a
parallel processor in accordance with a second embodiment
of the present invention;

FIG. 8 shows the structures of an instruction fetch unit
and an instruction issue unit of the parallel processor
shown in FIG. 7;

FIG. 9 illustrates basic instruction rearrangement in the
parallel processor of the second embodiment of the present
invention;

FIG. 10 is a circuit diagram of a conversion unit in the
parallel processor shown in FIG. 7;

FIG. 11 is a circuit diagram of the conversion unit in a
case where the maximum basic instruction word length is 4;

FIG. 12 shows the structure of a second example of the
parallel processor in accordance with the second embodiment
of the present invention;

FIG. 13 shows the structures of an instruction fetch unit
and an instruction issue unit of the parallel processor
shown in FIG. 12;

FIG. 14 shows the structure of a third example of the
parallel processor in accordance with the second embodiment
of the present invention;

FIG. 15 shows the structures of an instruction fetch unit
and an instruction issue unit of the parallel processor
shown in FIG. 14;

FIG. 16 shows the structure of a fourth example of the
parallel processor in accordance with the second embodiment
of the present invention;

FIG. 17 shows the structures of an instruction fetch unit
and an instruction issue unit of the parallel processor
shown in FIG. 16;

FIG. 18 shows the structure of a fifth example of the
parallel processor in accordance with the second embodiment
of the present invention;

FIG. 19 shows the structures of an instruction fetch unit
and an instruction issue unit of the parallel processor
shown in FIG. 18;

FIG. 20 shows the structure of a sixth example of the
parallel processor in accordance with the second embodiment
of the present invention;

FIG. 21 shows the structures of an instruction fetch unit
and an instruction issue unit of the parallel processor
shown in FIG. 20;

FIG. 22 shows the structure of a first example of a
parallel processor in accordance with a third embodiment of
the present invention;

FIG. 23 shows the structure of a second example of the
parallel processor in accordance with the third embodiment
of the present invention;

FIG. 24 shows the structure of a third example of the
parallel processor in accordance with the third embodiment
of the present invention;

FIG. 25 shows the structure of a fourth example of the
parallel processor in accordance with the third embodiment
of the present invention;

FIG. 26 shows the structure of a fifth example of the
parallel processor in accordance with the third embodiment
of the present invention;

FIG. 27 shows the structure of a sixth example of the
parallel processor in accordance with the third embodiment
of the present invention;

FIG. 28 shows the structure of a first example of a
parallel processor in accordance with a fourth embodiment
of the present invention;

FIG. 29 shows the structure of a second example of the
parallel processor in accordance with the fourth embodiment
of the present invention;

FIG. 30 shows the structure of a third example of the
parallel processor in accordance with the fourth embodiment
of the present invention;

FIG. 31 shows the structure of a fourth example of the
parallel processor in accordance with the fourth embodiment
of the present invention;

FIG. 32 shows the structure of a fifth example of the
parallel processor in accordance with the fourth embodiment
of the present invention;

FIG. 33 shows the structure of a sixth example of the
parallel processor in accordance with the fourth embodiment
of the present invention;

FIG. 34 shows the structure of a first example of a
parallel processor in accordance with a fifth embodiment of
the present invention;

FIG. 35 shows the structure of a second example of the
parallel processor in accordance with the fifth embodiment
of the present invention;

FIG. 36 shows the structure of a third example of the
parallel processor in accordance with the fifth embodiment
of the present invention;

FIG. 37 shows the structure of a fourth example of the
parallel processor in accordance with the fifth embodiment
of the present invention;

FIG. 38 shows the structure of a fifth example of the
parallel processor in accordance with the fifth embodiment
of the present invention; and

FIG. 39 shows the structure of a sixth example of the
parallel processor in accordance with the fifth embodiment
of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The following is a description of embodiments of the
present invention, with reference to the accompanying
drawings.

[First Embodiment]

FIGS. 3 and 6 show parallel processors 20 and 21 in
accordance with a first embodiment of the present
invention. The parallel processor 20 comprises an
instruction fetch unit 46 connected to a memory 12, an
instruction issue unit 72 connected to the instruction
fetch unit 46, two instruction execution units EU0 and EU1
having the same structure and connected to the instruction
issue unit 72, and a register unit 98 connected to each of
the instruction execution units EU0 and EU1. Likewise, the
parallel processor 21 comprises an instruction fetch unit
47 connected to a memory 12, an instruction issue unit 73
connected to the instruction fetch unit 47, two instruction
execution units EU0 and EU1 having the same structure and
connected to the instruction issue unit 73, and a register
unit 98 connected to each of the instruction execution
units EU0 and EU1.

It should be noted that, in the following description, the
maximum basic instruction word length of one instruction
word, that is, the maximum number of basic instructions it
contains, is 2. However, the parallel processor in
accordance with the first embodiment should operate in the
same manner in a case where the maximum basic instruction
word length of one instruction word is 3 or greater.

(Example 1)

FIG. 4 shows the structure of the instruction fetch unit
46 and the instruction issue unit 72. The instruction fetch
unit 46 comprises a fetch program counter (FPC) 300, adders
324 and 325, an instruction buffer 308, a cutting unit 316,
and an execution program counter (EPC) 339.

The FPC 300 is connected to the memory 12 and the
instruction execution units EU0 and EU1. The adder 324 is
connected to the FPC 300. The instruction buffer 308 is
connected to the memory 12, and the cutting unit 316 is
connected to the instruction buffer 308. The adder 325 is
connected to the cutting unit 316, and the EPC 339 is
connected to the adder 325 and the register unit 98. The
FPC 300 receives a fetch address contained in an
instruction word from the memory 12, and the instruction
buffer 308 receives fetch data contained in the instruction
word from the memory 12. The FPC 300 further receives a
branch destination address corresponding to a branch
instruction from the instruction execution units EU0 and
EU1.

On the other hand, the instruction issue unit 72 comprises
an instruction register 347, selectors 355 and 356, a
control unit 370, and an AND gate 378. Here, the
instruction register 347 is connected to the cutting unit
316. The selectors 355 and 356 are both connected to the
instruction register 347. The selector 355 is connected to
the instruction execution unit EU0, while the selector 356
is connected to the instruction execution unit EU1. The
control unit 370 is connected to the AND gate 378 and the
selectors 355 and 356. The AND gate 378 is connected to the
instruction execution units EU0 and EU1. In this structure,
the instruction execution units EU0 and EU1 transmit
execution complete signals EUc0 and EUc1, respectively, to
the AND gate 378.

FIG. 5 shows the formats of instruction words to be
supplied to the parallel processors of the first
embodiment. Each instruction word is made up of one or more
basic instructions EI and at least one of instruction word
delimiting fields 0 and 1. The basic instruction word
length is either 1 or 2. The upper row of FIG. 5 indicates
an instruction word having a basic instruction word length
of 2, consisting of a basic instruction word made up of an
instruction word delimiting field 0 and a basic instruction
EI, and another basic instruction word made up of an
instruction word delimiting field 1 and a basic instruction
EI. The lower row of FIG. 5 indicates an instruction word
having a basic instruction word length of 1, consisting of
an instruction word delimiting field 1 and a basic
instruction EI.

The above instruction words are stored in the memory 12 in
advance. The adder 324 in the instruction fetch unit 46 of
the parallel processor 20 increments the address by a fixed
length DISP, so that the instruction words can be fetched
from the memory 12 in order. When the cutting unit 316 in
the instruction fetch unit 46 fetches the instruction word
of the upper row of FIG. 5, for instance, it recognizes the
instruction word delimiting field and the following basic
instruction EI as one instruction word. The cutting unit
316 then cuts the instruction word from the instruction
word string, and stores it in the instruction fetch unit
46. The adder 325 calculates the address corresponding to
the basic instruction EI to be executed in accordance with
an instruction word length signal SL supplied from the
cutting unit 316. The calculated address is temporarily
stored in the EPC 339. A return address for rerunning the
basic instruction EI that is stored in the EPC 339 is
supplied to the register unit 98.

Based on the instruction word delimiting fields 0 and 1
contained in the instruction words supplied from the
cutting unit 316, the instruction issue unit 72 recognizes
each basic instruction EI, and issues each basic
instruction EI selectively to one of the instruction
execution units EU0 and EU1 via the selectors 355 and 356.
Accordingly, a basic instruction EI following an
instruction word delimiting field 0 is issued to the
instruction execution unit EU0, while a basic instruction
EI following an instruction word delimiting field 1 is
issued to the instruction execution unit EU1. The selectors
355 and 356 are controlled by the control unit 370. When
the execution of one instruction word is completed, the
corresponding basic instruction EI is supplied to the
instruction execution units EU0 and EU1 via the selectors
355 and 356.
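
A minimal sketch of this issue step follows. It assumes
that a cut instruction word arrives as a list of
(delimiting field, basic instruction) pairs and that the
AND gate 378 simply combines the execution complete signals
EUc0 and EUc1. The function names and the strings standing
in for basic instructions are hypothetical.

```python
def ready_to_issue(euc0, euc1):
    """Model of the AND gate 378: true only when both units report completion."""
    return euc0 and euc1


def issue(word, euc0=True, euc1=True):
    """Route each basic instruction by its delimiting field.

    `word` is a list of (field, instruction) pairs with field 0 or 1.
    Returns {"EU0": ..., "EU1": ...}, or None while a unit is still busy.
    """
    if not ready_to_issue(euc0, euc1):
        return None
    slots = {"EU0": None, "EU1": None}
    for field, instruction in word:
        if field not in (0, 1):
            raise ValueError("delimiting field must be 0 or 1")
        # Field 0 selects EU0 (selector 355); field 1 selects EU1 (selector 356).
        slots["EU0" if field == 0 else "EU1"] = instruction
    return slots


print(issue([(0, "add"), (1, "mul")]))   # word of length 2
print(issue([(1, "sub")]))               # word of length 1
print(issue([(1, "sub")], euc0=False))   # EU0 still executing
```

A unit that receives no basic instruction is simply left
idle in this model, so no do-nothing instruction has to be
stored in the instruction word.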






Likewise, in a case where the instruction fetch unit 46
fetches and then supplies the instruction word having the
basic instruction word length of 1 to the instruction
buffer 308, the cutting unit 316 cuts the basic instruction
EI that follows the instruction word delimiting field 1
from the rest of the instruction word. The instruction
register 347 then issues the basic instruction EI to one of
the instruction execution units EU0 and EU1.

The instruction word delimiting fields 0 and 1 are both
represented by one bit, but any sort of data can be written
in those fields as long as they can function to delimit the
instruction words. In this example, the two instruction
execution units EU0 and EU1 having the same structure are
employed, but it is also possible to employ three or more
instruction execution units.
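
The delimiting scheme can be sketched as follows. The
sketch assumes, from the formats of FIG. 5, that a
delimiting field of 1 marks the last basic instruction of
an instruction word and a field of 0 marks one that is
followed by another. The stream representation and the
function name are hypothetical.

```python
def cut_words(stream):
    """Split a flat stream of (field, instruction) pairs into instruction words.

    A field of 1 is taken to close the current instruction word.
    """
    words, current = [], []
    for field, instruction in stream:
        current.append((field, instruction))
        if field == 1:
            words.append(current)
            current = []
    if current:
        raise ValueError("stream ends in the middle of an instruction word")
    return words


stream = [(0, "add"), (1, "mul"), (1, "sub")]
for word in cut_words(stream):
    # len(word) plays the role of the instruction word length signal SL.
    print(len(word), [instruction for _, instruction in word])
```

Because the word boundary is carried by the stream itself,
words of length 1 and 2 can be stored back to back without
padding.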

As described so far, in the parallel processor 20 of this
example, the instruction fetch unit 46 fetches instruction
words one by one in accordance with the instruction word
delimiting fields 0 and 1, so that the length of each of
the instruction words can be made variable. The instruction
issue unit 72 then issues a basic instruction EI to a
corresponding one of the instruction execution units EU0
and EU1. Accordingly, there is no need to include
do-nothing instructions NOP in any instruction word, and
basic instructions EI can be efficiently included in each
instruction word. By executing the basic instructions EI in
the instruction words, the parallel processing performance
of the parallel processor can be improved.

(Example 2)

FIG. 6 shows the structure of a second example of the
parallel processor 21 in accordance with the first
embodiment of the present invention. As shown in FIG. 6,
the parallel processor 21 has the same structure as the
parallel processor 20 shown in FIG. 3, except for a
judgment unit 103 that determines whether or not each basic
instruction EI supplied to the instruction issue unit 73
has data dependence or control dependence with a basic
instruction EI being executed by one of the instruction
execution units EU0 and EU1, and whether or not each basic
instruction EI shares one resource with another basic
instruction EI being executed by one of the instruction
execution units EU0 and EU1.

The judgment unit 103 compares a destination register
number (write register number) defined in a basic
instruction EI in execution with a source register number
(read register number) defined in a basic instruction EI to
be issued to one of the instruction execution units EU0 and
EU1. If the destination register number coincides with the
source register number, it is confirmed that there is data
dependence between the two basic instructions EI. If the
destination register number does not coincide with the
source register number, it is confirmed that there is no
data dependence between the two basic instructions EI, and
the operation can proceed.

The judgment unit 103 also determines whether or not the
basic instruction EI in execution contains a branch
instruction, and whether or not the basic instruction EI
has a possibility of starting an irregular process such as
a division by 0. If the basic instruction EI in execution
contains a branch instruction or has a possibility of an
irregular process, there is control dependence between the
basic instruction EI in execution and the basic instruction
EI to be issued to the instruction execution unit EU0 or
EU1. If the basic instruction EI in execution neither
contains a branch instruction nor has a possibility of an
irregular process, it is confirmed that there is no control
dependence between the two basic instructions EI.

Based on the contents of each basic instruction EI, the
judgment unit 103 also compares the resource (the
instruction execution units EU0 and EU1, for instance)
required by the basic instruction EI in execution with the
resource required by the basic instruction EI to be issued.
If the resource required by the basic instruction EI in
execution is the same as the resource required by the basic
instruction EI to be issued, there is resource sharing
between the two basic instructions EI. If the resources are
different, it is confirmed that there is no resource
sharing between the two basic instructions EI.

If the basic instruction EI to be issued has neither data
dependence nor control dependence, and causes no resource
sharing with the basic instruction EI being executed by the
instruction execution units EU0 and EU1, the instruction
issue unit 73 issues the basic instruction EI to a
corresponding one of the instruction execution units EU0
and EU1 before the end of the execution. Here, the
instruction issuance by the instruction issue unit 73 and
the instruction execution by the instruction execution
units EU0 and EU1 are processed by time-sharing parallel
processing.

On the other hand, if the basic instruction EI to be
issued has data dependence and/or control dependence,
and/or causes resource sharing with the basic instruction
EI being executed by the instruction execution units EU0
and EU1, the basic instruction EI is issued to a
corresponding one of the instruction execution units EU0
and EU1 after the end of the execution.

Although the two instruction execution units EU0 and EU1
having the same structure are employed in this example, it
is also possible to employ three or more instruction
execution units.

As described so far, the parallel processor 21 of this
example can have the same effects as the parallel processor
20 of Example 1, and efficiently and accurately performs
the parallel processing of the basic instructions EI. Thus,
more reliable operations can be achieved.
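
The three checks made by the judgment unit 103 in this
example, and the resulting issue decision, can be sketched
as follows. The record type, its field names, and the
function names are hypothetical; the specification
describes the checks only in prose, so this is one possible
reading and not the circuit itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BasicInstruction:
    unit: str                     # resource required, e.g. "EU0" or "EU1"
    dest: Optional[int] = None    # destination (write) register number
    srcs: Tuple[int, ...] = ()    # source (read) register numbers
    is_branch: bool = False
    may_trap: bool = False        # could start an irregular process, e.g. division by 0


def data_dependent(running, pending):
    """Destination register of the running instruction is read by the pending one."""
    return running.dest is not None and running.dest in pending.srcs


def control_dependent(running):
    """The running instruction is a branch or may start an irregular process."""
    return running.is_branch or running.may_trap


def shares_resource(running, pending):
    """Both instructions require the same execution unit."""
    return running.unit == pending.unit


def can_issue_early(running, pending):
    """True if `pending` may be issued before `running` finishes executing."""
    return not (data_dependent(running, pending)
                or control_dependent(running)
                or shares_resource(running, pending))


running = BasicInstruction(unit="EU0", dest=3, srcs=(1, 2))
print(can_issue_early(running, BasicInstruction(unit="EU1", dest=5, srcs=(4, 6))))
print(can_issue_early(running, BasicInstruction(unit="EU1", dest=5, srcs=(3, 6))))
```

The first pending instruction is independent and may be
issued early; the second reads register 3, which the
running instruction writes, so it has to wait for the end
of the execution.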



[Second Embodiment]

FIGS. 7, 12, 14, 16, 18, and 20 show parallel processors
22 to 27 in accordance with a second embodiment of the
present invention. Each of the parallel processors 22-27
comprises an instruction fetch unit 48-53 connected to a
memory 12, an instruction issue unit 74-79 connected to the
instruction fetch unit 48-53, instruction execution units
LU0, IU0, IU1, FU0, FU1, and BU0 connected to the
instruction issue unit 74-79, and a register unit 99
connected to all the instruction execution units LU0, IU0,
IU1, FU0, FU1, and BU0.

The instruction execution unit LU0 is a load store
instruction execution unit that executes a load instruction
and a store instruction. After the execution of these
instructions, the instruction execution unit LU0 notifies
the instruction issue unit 74-79 of the end of the
execution. The instruction execution units IU0 and IU1 are
integer arithmetic instruction execution units that execute
integer arithmetic instructions. When the execution of the
integer arithmetic instructions is completed, the
instruction execution units IU0 and IU1 notify the
instruction issue unit 74-79 of the end of the execution.

The instruction execution units FU0 and FU1 are
floating-point arithmetic instruction execution units that
execute floating-point arithmetic instructions. When the
execution of the floating-point arithmetic instructions is
completed, the instruction execution units FU0 and FU1
notify the instruction issue unit 74-79 of the end of the
execution. The instruction execution unit BU0 is a branch
instruction execution unit that executes a branch
instruction. When the execution of the branch instruction
is completed, the instruction execution unit BU0 notifies
the instruction issue unit 74-79 of the end of the
execution.

In the following examples, the maximum basic instruction
word length contained in one instruction word is 2, but the
same effects can be expected in a case where the maximum
basic instruction word length is 3 or greater.

(Example 1)

FIG. 7 shows the structure of a first example of the
parallel processor in accordance with the second embodiment
of the present invention. As shown in FIG. 7, the parallel
processor 22 comprises a conversion unit 115 in the
instruction fetch unit 48. The conversion unit 115
rearranges basic instructions contained in one fetched
instruction word in accordance with the structure of the
instruction execution units LU0, IU0, IU1, FU0, FU1, and
BU0, and then supplies the rearranged basic instructions to
the instruction issue unit 74. This rearrangement by the
conversion unit 115 facilitates the issuance of the basic
instructions by the instruction issue unit 74.

More specifically, the parallel processor of the present
invention is embodied on a printed board or an LSI circuit.
The components are arranged on a two-dimensional surface
and connected by wires. At this point, the wires might
cross each other. However, a printed board and an LSI
circuit have a plurality of wiring layers, so that any two
wires that might cross each other can be arranged on two
different wiring layers. Logically, it is possible to place
wires in any desired arrangement. In view of the operation
speed of the circuit, however, the above alternate wiring
(arranging wires on different wiring layers) requires
longer wires, which will decrease the operation speed.
Therefore, it is preferable to have less alternate wiring.
Shorter wires will facilitate the issuance of the basic
instructions by the instruction issue unit 74, and increase
the operation speed.

FIG. 8 shows the structures of the instruction fetch unit
48 and the instruction issue unit 74 of the parallel
processor 22 shown in FIG. 7. The instruction fetch unit 48
and the instruction issue unit 74 have the same structures
as the instruction fetch unit 46 and the instruction issue
unit 72 shown in FIG. 4, except that the instruction fetch
unit 48 includes the conversion unit 115 connected to a
cutting unit 317. The instruction execution unit BU0
supplies information, such as a branch destination address
corresponding to a branch instruction, to an FPC 301.

For simplification of the drawing, only two instruction
passages from an instruction register 348 to the two
instruction execution units LU0 and IU0 are shown in FIG.
8. However, it should be understood that there are the
other instruction passages to the instruction execution
units IU1, FU0, FU1, and BU0, as shown in FIG. 7.

The parallel processor 22 of this example operates in the
following manner. First, the cutting unit 317 of the
instruction fetch unit 48 fetches instruction words one by
one. The formats 13 of the instruction words to be supplied
to the instruction fetch unit 48 are shown in FIG. 9. As
shown in FIG. 9, each of the instruction words includes an
instruction word delimiting field 0 and/or an instruction
word delimiting field 1 and one or two instructions
selected from the group consisting of an integer arithmetic
instruction II, a floating-point arithmetic instruction FI,
a load store instruction LI, and a branch instruction BI.

An interface 15 for the instruction execution units LU0,
IU0, IU1, FU0, FU1, and BU0 includes effective bits V,
information II required for executing an integer arithmetic
instruction, information FI required for executing a
floating-point arithmetic instruction, information LI
required for executing a load store instruction, and
information BI required for executing a branch instruction.
The interface 15 supplies the effective bit V and the
information LI from the instruction issue unit 74 to the
instruction execution unit LU0, the effective bit V and the
information II to the instruction execution units IU0 and
IU1, the effective bit V and the information FI to the
instruction execution units FU0 and FU1, and the effective
bit V and the information BI to the instruction execution
unit BU0.

When the effective bit V is 0, no basic instruction is
issued, and when the effective bit V is 1, a basic
instruction is issued. Each effective bit V is coupled with
the information II, FI, LI, or BI, and is then allocated to
each corresponding instruction execution unit.

As shown in FIG. 9, the instruction word formats 13 are
rearranged and converted into instruction word formats 17
by the conversion unit 115 in the instruction fetch unit
48. The instruction word formats 17 correspond to the
instruction execution units LU0, IU0, IU1, FU0, FU1, and
BU0, and are supplied to the instruction register 348 in
the instruction issue unit 74. The instruction register 348
issues basic instructions each having the effective bit V
of 1 to corresponding instruction execution units. For
instance, when the instruction word on the uppermost row of
the instruction word format 17 is supplied to the
instruction issue unit 74, the instruction issue unit 74
issues the floating-point arithmetic instruction FI
provided with "1" as the effective bit V to the instruction
execution unit FU0, and the branch instruction BI also
provided with "1" as the effective bit V to the instruction
execution unit BU0.

As a result, the instruction execution 
unit FUO executes the floating-point arithmetic 
instruction FI, and the instruction execution unit 
BUO executes the branch instruction BI. In this 
case, no basic instructions are executed by the 
other instruction execution units LUO, lUO , lUl , and 
FUl . 
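
The issue rule described above can be illustrated with a
minimal sketch. The names Slot, issue, and word are
hypothetical and do not appear in the specification; the
sketch only assumes that each execution unit receives one
effective bit V together with its information field.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    valid: int            # effective bit V: 1 = issue, 0 = nothing to issue
    info: Optional[str]   # payload II, FI, LI, or BI; None when V is 0


def issue(word: Dict[str, Slot]) -> List[str]:
    """Return the execution units that receive a basic instruction."""
    return [unit for unit, slot in word.items() if slot.valid == 1]


# Uppermost row of the instruction word formats 17: only FI and BI are valid.
word = {
    "LU0": Slot(0, None), "IU0": Slot(0, None), "IU1": Slot(0, None),
    "FU0": Slot(1, "FI"), "FU1": Slot(0, None), "BU0": Slot(1, "BI"),
}
issued = issue(word)  # FU0 and BU0 are issued; the other units stay idle
```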

FIG. 10 is a circuit diagram of the conversion unit 115
shown in FIG. 8. As shown in FIG. 10, the conversion unit
115 comprises transmission lines L1 and L2, BI detectors BD1
and BD2, FI detectors FD1 and FD2, II detectors ID1 and ID2,
LI detectors LD1 and LD2, buffers 155 to 158, AND gates 163
to 166, 185, and 186, exclusive OR gates 187 to 190,
selectors 209 to 212, and OR gates 199 to 202.

The transmission line L1 transmits the first basic
instruction contained in each instruction word, and the
transmission line L2 transmits the second basic instruction
contained in each instruction word. The BI detector BD1 is
connected to the transmission line L1, and the BI detector
BD2 is connected to the transmission line L2. The buffer
155 is connected to the BI detector BD1, and the AND gate
163 is connected to the BI detectors BD1 and BD2. The
selector 209 is connected to the transmission lines L1 and
L2, the buffer 155, and the AND gate 163. The OR gate 199
is connected to the buffer 155 and the AND gate 163.

The FI detector FD1 is connected to the transmission line
L1, and the FI detector FD2 is connected to the
transmission line L2. The buffer 156 is connected to the FI
detector FD1, and the AND gate 164 is connected to the FI
detectors FD1 and FD2. The two input terminals of the
exclusive OR gate 187 are connected to the input node and
the output node, respectively, of the buffer 156. The two
input terminals of the exclusive OR gate 188 are connected
to the output node of the AND gate 164 and the FI detector
FD2, respectively. The AND gate 185 is connected to the two
exclusive OR gates 187 and 188. The selector 210 is
connected to the transmission lines L1 and L2, the buffer
156, and the AND gate 164. The OR gate 200 is connected to
the buffer 156 and the AND gate 164.

The II detector ID1 is connected to the transmission line
L1, and the II detector ID2 is connected to the
transmission line L2. The buffer 157 is connected to the II
detector ID1, and the AND gate 165 is connected to the II
detectors ID1 and ID2. The two input terminals of the
exclusive OR gate 189 are connected to the input node and
the output node, respectively, of the buffer 157. The two
input terminals of the exclusive OR gate 190 are connected
to the output node of the AND gate 165 and the II detector
ID2, respectively. The AND gate 186 is connected to the two
exclusive OR gates 189 and 190. The selector 211 is
connected to the transmission lines L1 and L2, the buffer
157, and the AND gate 165. The OR gate 201 is connected to
the buffer 157 and the AND gate 165.

The LI detector LD1 is connected to the transmission line
L1, and the LI detector LD2 is connected to the
transmission line L2. The buffer 158 is connected to the LI
detector LD1, and the AND gate 166 is connected to the LI
detectors LD1 and LD2. The selector 212 is connected to the
transmission lines L1 and L2, the buffer 158, and the AND
gate 166. The OR gate 202 is connected to the buffer 158
and the AND gate 166.

The two BI detectors BD1 and BD2 constitute a BI detector
block 147. The two FI detectors FD1 and FD2 constitute an
FI detector block 149. The two II detectors ID1 and ID2
constitute an II detector block 151. The two LI detectors
LD1 and LD2 constitute an LI detector block 153.

In the following, an operation of the conversion unit 115
having the above structure will be described by way of an
example case where the instruction word including the basic
instructions BI and FI on the uppermost row of the
instruction word formats 13 shown in FIG. 9 is supplied to
the conversion unit 115. First, the basic instruction BI is
transmitted through the transmission line L1. The BI
detector BD1 then detects the basic instruction BI and
supplies a detection signal of logic 1 to the buffer 155.
At this point, the AND gate 163 outputs a logic 0 signal.
In accordance with the detection signal supplied from the
buffer 155, the selector 209 selects the first basic
instruction BI and outputs the first basic instruction BI,
that is, an instruction to be executed by the instruction
execution unit BU0, to the instruction issue unit 74. At
the same time as the output of the basic instruction BI, in
accordance with the detection signal supplied from the
buffer 155, the OR gate 199 outputs the effective bit V of
logic 1. As the first basic instruction BI is detected, the
FI detector FD1, the II detector ID1, and the LI detector
LD1 output non-detection signals of logic 0. Accordingly,
the selectors 210, 211, and 212 do not select the first
basic instruction transmitted through the transmission line
L1.

Next, the second basic instruction FI in the instruction
word is transmitted through the transmission line L2. As in
the case of the first basic instruction BI, the FI detector
FD2 detects the second basic instruction FI and supplies a
detection signal of logic 1 to the AND gate 164. The AND
gate 164 in turn outputs a logic 1 signal. In accordance
with the logic 1 signal supplied from the AND gate 164, the
selector 210 selects the second basic instruction FI and
outputs the second basic instruction FI as an instruction
to be executed by the instruction execution unit FU0. At
the same time as the output of the basic instruction FI,
the OR gate 200 outputs the effective bit V of logic 1 in
accordance with the detection signal supplied from the AND
gate 164.

As the second basic instruction FI is detected, the BI
detector BD2, the II detector ID2, and the LI detector LD2
output non-detection signals of logic 0. Accordingly, the
selectors 209, 211, and 212 do not select the second basic
instruction transmitted through the transmission line L2.
Since neither the first nor the second basic instruction is
detected as an instruction to be executed by the
instruction execution units LU0, IU0, IU1, and FU1, the
effective bit V of logic 0 is outputted from each of the OR
gates 201 and 202, and the AND gates 185 and 186.

In the above described manner, the conversion unit 115
converts the instruction word formats 13 into the
instruction word formats 17, as shown in FIG. 9.
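
The behavior of one detector, buffer, AND gate, selector,
and OR gate chain can be sketched as follows. This is only
an illustration under assumptions: the function name
select_path is hypothetical, and the inverted first-line
input of the AND gate is inferred from the statement that
the AND gate 163 outputs logic 0 when the BI detector BD1
has already detected the instruction. The sketch models
only the first selector of a path (209, 210, 211, or 212),
not the second FI or II unit.

```python
from typing import Optional, Tuple


def select_path(kind: str, l1: str, l2: str) -> Tuple[int, Optional[str]]:
    """Model one path of FIG. 10, for example the BI path."""
    d1 = int(l1 == kind)          # detector on transmission line L1
    d2 = int(l2 == kind)          # detector on transmission line L2
    buf = d1                      # buffer passes the L1 detection signal
    and_gate = (1 - d1) & d2      # ASSUMPTION: the L1 input is inverted
    if buf:
        selected = l1             # selector takes the first basic instruction
    elif and_gate:
        selected = l2             # selector takes the second basic instruction
    else:
        selected = None           # nothing of this kind in the word
    v = buf | and_gate            # OR gate produces the effective bit V
    return v, selected
```

For the instruction word (BI, FI) discussed above, calling
select_path("BI", "BI", "FI") models the selector 209 and
select_path("FI", "BI", "FI") models the selector 210.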

FIG. 11 is a circuit diagram of the conversion unit 115 in
a case where the maximum basic instruction word length of
one instruction word to be supplied from the memory 12 to
the instruction fetch unit 48 is 4. As shown in FIG. 11,
the structure of the conversion unit 115 in this case is
the same as the structure of the conversion unit 115 shown
in FIG. 10, except that the number of transmission lines is
4, the number of BI detectors is 4, the number of FI
detectors is 4, the number of II detectors is 4, and the
number of LI detectors is 4. Also, two selectors 214 and
215 are provided for a basic instruction FI, and two
selectors 216 and 217 are provided for a basic instruction
II in this case.

The conversion unit 115 further includes buffers 159 to
162, AND gates 167 to 184, exclusive OR gates 191 to 198,
OR gates 203 to 208, and selectors 213 and 218. The four BI
detectors BD1 to BD4 constitute a BI detector block 148.
The four FI detectors FD1 to FD4 constitute an FI detector
block 150. The four II detectors ID1 to ID4 constitute an
II detector block 152. The four LI detectors LD1 to LD4
constitute an LI detector block 154.

The conversion unit 115 having the above structure operates
in the same manner as the conversion unit 115 shown in FIG.
10. In the following, an operation of the conversion unit
115 in a case where an instruction word made up of basic
instructions BI, FI, FI, and II is supplied to the
conversion unit 115 will be described. First, the first
basic instruction BI is transmitted through the
transmission line L1. The BI detector BD1 then detects the
basic instruction BI and supplies a detection signal of
logic 1 to the buffer 159. At this point, each of the AND
gates 167 to 169 outputs a logic 0 signal. In accordance
with the detection signal supplied from the buffer 159, the
selector 213 selects the first basic instruction BI and
outputs the first basic instruction BI, that is, an
instruction to be executed by the instruction execution
unit BU0, to the instruction issue unit 74. At the same
time as the output of the first basic instruction BI, the
OR gate 203 outputs the effective bit V of logic 1 in
accordance with the detection signal supplied from the
buffer 159. As the first basic instruction BI is detected,
the FI detector FD1, the II detector ID1, and the LI
detector LD1 output non-detection signals of logic 0.
Accordingly, the selectors 214, 216, and 218 do not select
the first basic instruction BI transmitted through the
transmission line L1.

Next, the second basic instruction FI is transmitted on the
transmission line L2. The FI detector FD2 then detects the
second basic instruction FI and supplies a detection signal
of logic 1 to the AND gate 170. The AND gate 170 in turn
outputs a logic 1 signal. In accordance with the logic 1
signal supplied from the AND gate 170, the selector 214
selects the second basic instruction FI and outputs the
second basic instruction FI as an instruction to be
executed by the instruction execution unit FU0. At the same
time as the output of the second basic instruction FI, the
OR gate 204 outputs the effective bit V of logic 1 in
accordance with the detection signal supplied from the AND
gate 170.

As the second basic instruction FI is detected, the BI
detector BD2, the II detector ID2, and the LI detector LD2
each output a non-detection signal of logic 0. Accordingly,
the selectors 213, 216, and 218 do not select the second
basic instruction FI transmitted through the transmission
line L2.

Next, the third basic instruction FI is transmitted through
the transmission line L3. The FI detector FD3 then detects
the third basic instruction FI and supplies a detection
signal of logic 1 to the AND gate 171. Since the AND gate
171 has already received a detection signal of logic 1 from
the FI detector FD2 at this point, the output of the AND
gate 171 is a logic 0 signal. Because of that, the
exclusive OR gate 193 outputs a logic 1 signal, and the AND
gate 174 also outputs a logic 1 signal. In accordance with
the logic 1 signal supplied from the AND gate 174, the
selector 215 selects the third basic instruction FI and
outputs the third basic instruction FI as an instruction to
be executed by the instruction execution unit FU1. At the
same time as the output of the third basic instruction FI,
the OR gate 205 outputs the effective bit V of logic 1 in
accordance with the signal supplied from the AND gate 174.

As the third basic instruction FI is detected, the BI
detector BD3, the II detector ID3, and the LI detector LD3
each output a non-detection signal of logic 0. Accordingly,
the selectors 213, 216, and 218 do not select the third
basic instruction FI transmitted through the transmission
line L3.

Next, the fourth basic instruction II of the instruction
word is transmitted through the transmission line L4. The
II detector ID4 then detects the fourth basic instruction
II and supplies a detection signal of logic 1 to the AND
gate 178. The AND gate 178 in turn outputs a logic 1
signal. In accordance with the logic 1 signal supplied from
the AND gate 178, the selector 216 selects the fourth basic
instruction II and outputs the fourth basic instruction II
as an instruction to be executed by the instruction
execution unit IU0. At the same time as the output of the
fourth basic instruction II, the OR gate 206 outputs the
effective bit V of logic 1 in accordance with the signal
supplied from the AND gate 178.
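
The four-instruction walk-through above amounts to giving
each basic instruction the first free execution unit of its
kind. The following is a behavioral sketch only, not the
gate-level circuit of FIG. 11; the names UNITS and convert
are hypothetical, and raising an error when no unit is free
is an assumption that the specification does not address.

```python
from typing import Dict, List, Optional, Tuple

# Units available for each kind of basic instruction, in priority order.
UNITS = {"LI": ["LU0"], "II": ["IU0", "IU1"], "FI": ["FU0", "FU1"], "BI": ["BU0"]}


def convert(word: List[str]) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each unit to (effective bit V, transmission line number or None)."""
    slots: Dict[str, Tuple[int, Optional[int]]] = {
        unit: (0, None) for units in UNITS.values() for unit in units
    }
    for line, kind in enumerate(word, start=1):
        free = [u for u in UNITS[kind] if slots[u][0] == 0]
        if not free:
            # ASSUMPTION: over-subscription is treated as an error here.
            raise ValueError(f"no free execution unit for {kind} on L{line}")
        slots[free[0]] = (1, line)
    return slots


# The instruction word (BI, FI, FI, II) used in the description.
slots = convert(["BI", "FI", "FI", "II"])
```

With this input, BU0 takes line L1, FU0 takes L2, FU1 takes
L3, and IU0 takes L4, while LU0 and IU1 keep an effective
bit V of 0.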

As described above, in the parallel processor of this
example, basic instructions contained in each instruction
word supplied to the instruction fetch unit 48 are
rearranged in accordance with the arrangement of the
instruction execution units, so that the instruction issue
unit 74 can smoothly issue the basic instructions to the
respective instruction execution units. Thus, the overall
operation speed can be increased.

In this example, the instruction fetch unit 48 can also
fetch an instruction word containing basic instructions
that have already been arranged in advance in accordance
with the arrangement of the instruction execution units. In
such a case, because the basic instructions are arranged in
advance, the circuit size required for rearranging the
basic instructions in the instruction fetch unit 48 can be
reduced.

More specifically, when there are two instruction words
for the same function, only one of the two is employed. For
instance, the instruction word on the uppermost row and the
instruction word on the fourth row from the top of the
formats 13 in FIG. 9 are rearranged into the same format in
the formats 17. In this case, only one of the two
instruction words should be employed, while the use of the
other should be inhibited. Alternatively, an instruction
word that will increase the number of alternate wire routes
in the instruction fetch unit 48 may be inhibited
beforehand. For instance, the instruction words on the
uppermost row and the fourth row from the top of the
formats 13 in FIG. 9 have the basic instructions BI and FI
in opposite orders. Since the circuit components are
arranged on a two-dimensional surface, one of the two
instruction words requires more alternate wire routes than
the other. Accordingly, the instruction word that requires
more alternate wire routes should be inhibited in advance.

As described so far, the circuit size of the parallel
processor 22 can be reduced by restricting in advance the
arrangement of basic instructions contained in each
instruction word to be supplied to the instruction fetch
unit 48.
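
One way to picture such a restriction is a check that
accepts only one ordering of each set of basic
instructions. The sketch below is an illustration under a
stated assumption: the specification does not say which of
two equivalent instruction words is kept, so the rule used
here (keep the word whose order follows the unit
arrangement LU, IU, FU, BU) and the names ORDER and
is_permitted are hypothetical.

```python
from typing import List

# ASSUMPTION: the permitted order follows the unit arrangement LU, IU, FU, BU.
ORDER = {"LI": 0, "II": 1, "FI": 2, "BI": 3}


def is_permitted(word: List[str]) -> bool:
    """True when the basic instructions already appear in unit order."""
    ranks = [ORDER[kind] for kind in word]
    return ranks == sorted(ranks)
```

Under this assumed rule, the word (FI, BI) is permitted and
the word (BI, FI), which converts into the same format, is
inhibited.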

(Example 2)

FIG. 12 shows the structure of a second example of the
parallel processor in accordance with the second embodiment
of the present invention. As shown in FIG. 12, the parallel
processor 23 of this example has the same structure as the
parallel processor 22 of Example 1, except that a
conversion unit 116 is included in the instruction issue
unit 75. The conversion unit 116 has the same structure and
functions as the conversion unit 115 shown in FIGS. 10 and
11.

FIG. 13 shows the structures of the instruction fetch unit
49 and the instruction issue unit 75 of the parallel
processor 23 shown in FIG. 12. The instruction fetch unit
49 and the instruction issue unit 75 have the same
structures as the instruction fetch unit 46 and the
instruction issue unit 72 shown in FIG. 4, except that the
instruction issue unit 75 includes the conversion unit 116
connected to an instruction register 349. For
simplification of the drawing, only the instruction
passages to the two instruction execution units LU0 and IU0
are shown, and the instruction passages to the other
instruction execution units IU1, FU0, FU1, and BU0 are
omitted in FIG. 13. Also, only two execution complete
signals LUc and IUc0 to be supplied to the AND gate 380 are
shown, and the other execution complete signals are omitted
in FIG. 13.

With the parallel processor of this 
example, basic instructions contained in each 
instruction word supplied from the instruction 
register 349 are rearranged by the conversion unit 
116 in accordance with the arrangement of the 
instruction execution units. The rearranged basic 
instructions are then issued to the corresponding 
instruction execution units. Thus, the wires can be 
shortened as a whole, and the operation speed can be 
increased. 

Also, the arrangement of basic instructions contained in
each instruction word to be supplied to the instruction
fetch unit 49 can be restricted in advance in the same
manner as in Example 1. Thus, the circuit size of the
parallel processor 23 can be reduced.

(Example 3)

FIG. 14 shows the structure of a third 
example of the parallel processor in accordance with 
the second embodiment of the present invention. As 
shown in FIG. 14, the parallel processor 24 has the 
same structure as the parallel processor 22 of 
Example 1 shown in FIG. 7, except that the 
instruction fetch unit 50 includes a first 
conversion unit 117 and the instruction issue unit 
76 includes a second conversion unit 118. The first 
conversion unit 117 rearranges basic instructions 
contained in each instruction word in accordance 
with the arrangement of the instruction execution 
units. The second conversion unit 118 rearranges basic
instructions contained in each instruction word in
accordance with the arrangement of the instruction
execution units.

FIG. 15 shows the structures of the instruction fetch unit
50 and the instruction issue unit 76 of the parallel
processor 24 shown in FIG. 14. The instruction fetch unit
50 and the instruction issue unit 76 have the same
structures as the instruction fetch unit 46 and the
instruction issue unit 72 shown in FIG. 4, except that the
instruction fetch unit 50 further includes the first
conversion unit 117 connected to a cutting unit 319 and the
instruction issue unit 76 further includes the second
conversion unit 118 connected to an instruction register
350. For simplification of the drawing, only the
instruction passages from the second conversion unit 118 to
the two instruction execution units LU0 and IU0 are shown,
and the instruction passages to the other instruction
execution units IU1, FU0, FU1, and BU0 are omitted in FIG.
15. Likewise, only two execution complete signals LUc and
IUc0 to be supplied to the AND gate 381 are shown, and the
other execution complete signals are omitted in FIG. 15.

The first conversion unit 117 performs "preprocessing" of
the rearrangement of basic instructions. The second
conversion unit 118 performs "postprocessing" of the
rearrangement of basic instructions.

In an actual circuit, the processes performed by the
instruction fetch unit 50 and the instruction issue unit 76
are pipelined so as to improve the performance of the
parallel processor. Because of that, the difference in
processing time between the instruction fetch unit 50 and
the instruction issue unit 76 should be as small as
possible to optimize the pipeline effects. Therefore, the
rearrangement process is divided into the "preprocessing"
and the "postprocessing", so that the difference in
processing time between the instruction fetch unit 50 and
the instruction issue unit 76 is small.

More specifically, the first conversion unit 117 includes
circuits that are the counterparts of the BI detector block
147 or 148, the FI detector block 149 or 150, the II
detector block 151 or 152, and the LI detector block 153 or
154 shown in FIGS. 10 and 11. The other circuits shown in
FIGS. 10 and 11 are included in the second conversion unit
118.
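
The division into detection on the fetch side and selection
on the issue side can be sketched as two functions. This is
a behavioral illustration only; the names preprocess,
postprocess, KINDS, and UNITS are hypothetical, and the
first-free-unit selection rule is an assumption carried
over from the walk-through of FIG. 11.

```python
from typing import Dict, List

KINDS = ("BI", "FI", "II", "LI")
UNITS = {"BI": ["BU0"], "FI": ["FU0", "FU1"], "II": ["IU0", "IU1"], "LI": ["LU0"]}


def preprocess(word: List[str]) -> List[Dict[str, int]]:
    """Stage 1 (fetch side): detector blocks only, one flag set per line."""
    return [{kind: int(basic == kind) for kind in KINDS} for basic in word]


def postprocess(word: List[str], flags: List[Dict[str, int]]) -> Dict[str, str]:
    """Stage 2 (issue side): selection, driven only by the stage 1 flags."""
    routed: Dict[str, str] = {}
    for kind in KINDS:
        lines = [i for i, f in enumerate(flags) if f[kind] == 1]
        for unit, i in zip(UNITS[kind], lines):
            routed[unit] = word[i]
    return routed


word = ["BI", "FI", "FI", "II"]
routed = postprocess(word, preprocess(word))
```

Because the second stage reads only the flags produced by
the first, the two halves can sit in different pipeline
stages, which is the balance the description aims for.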

With the parallel processor 24 having the above structure,
the wires can be shortened as a whole, and the operation
speed can be increased.

Also, as in Examples 1 and 2, the circuit size of the
parallel processor 24 may be reduced by restricting in
advance the arrangement of basic instructions contained in
each instruction word to be supplied to the instruction
fetch unit 50.

(Example 4)

FIG. 16 shows the structure of a fourth example of the
parallel processor in accordance with the second embodiment
of the present invention. As shown in FIG. 16, the parallel
processor 25 has the same structure as the parallel
processor 22 of Example 1 shown in FIG. 7, except that the
instruction fetch unit 51 includes a conversion unit 119
and the instruction issue unit 77 includes a judgment unit
104.

FIG. 17 shows the structures of the instruction fetch unit
51 and the instruction issue unit 77 of the parallel
processor 25 shown in FIG. 16. The instruction fetch unit
51 and the instruction issue unit 77 have the same
structures as the instruction fetch unit 48 and the
instruction issue unit 74 shown in FIG. 8, except that the
instruction issue unit 77 further includes the judgment
unit 104. The judgment unit 104 determines whether or not a
basic instruction to be issued has data dependency or
control dependency with a supplied basic instruction. The
judgment unit 104 also determines whether or not the basic
instruction to be issued shares resources with the supplied
basic instruction. If the basic instruction to be issued
has data dependency or control dependency, or shares
resources with the supplied basic instruction, the
instruction issue unit 77 issues the basic instruction
after the execution complete signals LUc and IUc0 are
supplied.
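
The decision made by the judgment unit can be pictured with
a minimal sketch. It is not the circuit of the judgment
unit 104; the class Basic, its register-set fields, and the
functions must_wait and can_issue are hypothetical, and
treating any earlier branch as a control dependency is a
simplifying assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Basic:
    unit: str                              # execution unit the instruction needs
    reads: FrozenSet[str] = frozenset()    # registers read (hypothetical field)
    writes: FrozenSet[str] = frozenset()   # registers written (hypothetical field)
    is_branch: bool = False


def must_wait(candidate: Basic, supplied: Basic) -> bool:
    """True if the candidate depends on, or collides with, the supplied one."""
    data = bool(candidate.reads & supplied.writes
                or candidate.writes & supplied.writes
                or candidate.writes & supplied.reads)
    control = supplied.is_branch           # ASSUMPTION: any earlier branch counts
    resource = candidate.unit == supplied.unit
    return data or control or resource


def can_issue(candidate: Basic, supplied: Basic, execution_complete: bool) -> bool:
    """Issue at once when independent, otherwise only after completion."""
    return execution_complete or not must_wait(candidate, supplied)
```

Here execution_complete stands in for the execution
complete signals LUc and IUc0 of the description.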

For simplification of the drawing, only the instruction
passages from an instruction register 351 to the two
instruction execution units LU0 and IU0 are shown, and the
other instruction passages to the instruction execution
units IU1, FU0, FU1, and BU0 are omitted in FIG. 17.
Likewise, only the two execution complete signals LUc and
IUc0 are shown as signals to be supplied to the judgment
unit 104, and the other execution complete signals are
omitted in FIG. 17.

The structure and operation of the conversion unit 119 are
substantially the same as the structure and operation of
the conversion unit 115 shown in FIGS. 10 and 11. The
structure and operation of the judgment unit 104 are
substantially the same as the structure and operation of
the judgment unit 103 shown in FIG. 6.

The parallel processor of this example having the above
structure achieves the same effects as the parallel
processor of Example 2 of the first embodiment and the
parallel processor of Example 1 of the second embodiment.
In the parallel processor of this example, the instruction
issue unit 77, which includes the judgment unit 104,
enables accurate and efficient parallel processing of basic
instructions, thereby increasing the reliability of the
parallel processor. Also, the instruction fetch unit 51,
which includes the conversion unit 119, facilitates the
issuance of basic instructions to the instruction execution
units by the instruction issue unit 77, thereby increasing
the operation speed.

As in the foregoing examples, the circuit size of the
parallel processor 25 may be reduced by restricting in
advance the arrangement of basic instructions contained in
each instruction word to be supplied to the instruction
fetch unit 51.
(Example 5)

FIG. 18 shows the structure of a fifth example of the
parallel processor in accordance with the second embodiment
of the present invention. As shown in FIG. 18, the parallel
processor 26 has the same structure as the parallel
processor 25 of Example 4, except that the instruction
fetch unit 52 includes no conversion unit and the
instruction issue unit 78 further includes a conversion
unit 120.

FIG. 19 shows the structures of the instruction fetch unit
52 and the instruction issue unit 78 of the parallel
processor 26 shown in FIG. 18. The instruction fetch unit
52 and the instruction issue unit 78 have the same
structures as the instruction fetch unit 49 and the
instruction issue unit 75 shown in FIG. 13, except that the
instruction issue unit 78 further includes the judgment
unit 105 connected between an instruction register 352 and
a control unit 375. In accordance with a supplied basic
instruction, the judgment unit 105 determines whether or
not a basic instruction to be issued has data dependency or
control dependency, and whether or not the basic
instruction to be issued will cause resource sharing. The
judgment results are reported to the control unit 375. If
the basic instruction to be issued has data dependency or
control dependency, or causes resource sharing, the
instruction issue unit 78 issues the basic instruction
after the supply of the execution complete signals LUc and
IUc0.

For simplification of the drawing, only the instruction
passages from the instruction register 352 to the two
instruction execution units LU0 and IU0 are shown, and the
instruction passages to the other instruction execution
units are omitted in FIG. 19. Likewise, only the two
execution complete signals LUc and IUc0 to be supplied to
the judgment unit 105 are shown in FIG. 19.

The structure and operation of the conversion unit 120 are
the same as the structure and operation of the conversion
unit 115 shown in FIGS. 10 and 11. The structure and
operation of the judgment unit 105 are the same as the
structure and operation of the judgment unit 104 shown in
FIG. 16.

The parallel processor of this example having the above
structure achieves the same effects as the parallel
processor of Example 4. The instruction issue unit 78
including the judgment unit 105 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation. Also,
the instruction issue unit 78, which further includes the
conversion unit 120, facilitates the issuance of basic
instructions to the instruction execution units.

Additionally, the circuit size of the parallel processor 26
may be reduced by restricting in advance the arrangement of
basic instructions contained in each instruction word to be
supplied to the instruction fetch unit 52, as in the
foregoing examples.

(Example 6)

FIG. 20 shows the structure of a sixth example of the
parallel processor in accordance with the second embodiment
of the present invention. As shown in FIG. 20, the parallel
processor 27 has the same structure as the parallel
processor 24 of Example 3 shown in FIG. 14, except that the
instruction issue unit 79 further includes a judgment unit
106.

FIG. 21 shows the structures of the instruction fetch unit
53 and the instruction issue unit 79 of the parallel
processor 27 shown in FIG. 20. The instruction fetch unit
53 and the instruction issue unit 79 have the same
structures as the instruction fetch unit 50 and the
instruction issue unit 76 shown in FIG. 15, except that the
instruction issue unit 79 further includes the judgment
unit 106 connected between an instruction register 353 and
a control unit 376. Based on a supplied basic instruction,
the judgment unit 106 determines whether or not a basic
instruction to be issued has data dependency or control
dependency, or causes resource sharing. The judgment
results are reported to the control unit 376. If the basic
instruction to be issued has data dependency or control
dependency, or causes resource sharing, the instruction
issue unit 79 issues the basic instruction only after the
execution complete signals LUc and IUc0 are supplied.

For simplification of the drawing, only the instruction
passages from the instruction register 353 to the two
instruction execution units LU0 and IU0 are shown, and the
instruction passages to the other instruction execution
units IU1, FU0, FU1, and BU0 are omitted in FIG. 21.
Likewise, only the two execution complete signals LUc and
IUc0 are shown in FIG. 21.

The structures and operations of a first conversion unit
121 and a second conversion unit 122 are the same as the
structures and operations of the first conversion unit 117
and the second conversion unit 118. The structure and
operation of the judgment unit 106 are the same as the
structure and operation of the judgment unit 103 shown in
FIG. 6.

The parallel processor 27 of this example having the above
structure can achieve the effects of both the parallel
processor of Example 2 of the first embodiment and the
parallel processor of Example 3 of the second embodiment.
More specifically, the instruction issue unit 79 including
the judgment unit 106 enables accurate and efficient
parallel processing of basic instructions, thereby
increasing the reliability of the operation. Also, the
instruction fetch unit 53 including the first conversion
unit 121 and the instruction issue unit 79 including the
second conversion unit 122 facilitate the issuance of basic
instructions from the instruction issue unit 79 to the
instruction execution units.

Additionally, the circuit size of the parallel processor 27
may be reduced by restricting in advance the arrangement of
basic instructions contained in each instruction word to be
supplied to the instruction fetch unit 53, as in the
foregoing examples.

[Third Embodiment]

As shown in FIGS. 22 to 27, parallel processors 28 to 33 in
accordance with a third embodiment of the present invention
each comprise an instruction fetch unit 54-59 connected to
the memory 12, an instruction issue unit 80-85 connected to
the instruction fetch unit 54-59, instruction execution
units LU0, IU0, IU1, FU0, FU1, MU0, MU1, and BU0, and a
register unit 100 connected to all the instruction
execution units. Here, the instruction execution units MU0
and MU1 are special-purpose arithmetic instruction
execution units that execute special-purpose arithmetic
instructions. When the execution of special-purpose
arithmetic instructions is completed, the instruction
execution units MU0 and MU1 notify the instruction issue
unit 80-85 of the completion of the execution.

In the following, the parallel processors in accordance
with the third embodiment of the present invention will be
described by way of a case where the maximum basic
instruction word length contained in one instruction word
is 2. It should be understood that the same effects can be
obtained in a case where the maximum basic instruction word
length contained in one instruction word is 3 or greater.

(Example 1)

FIG. 22 shows the structure of a first 
example of the parallel processor in accordance with 
the third embodiment of the present invention. As 
shown in FIG. 22, the parallel processor 28 
comprises a conversion unit 123 in the instruction 
fetch unit 54. The structure and the operation of 
the conversion unit 123 are the same as the 
conversion unit 115 of Example 1 of the second 
embodiment. More specifically, the conversion unit 
123 rearranges basic instructions contained in each 
instruction word in accordance with the arrangement 
of the instruction execution units, and then 
supplies the rearranged basic instructions to the 
instruction issue unit 80. 
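
The rearrangement performed by the conversion unit may be illustrated by the following sketch. It is not the circuit of FIGS. 10 and 11; it assumes, for illustration only, that each basic instruction carries a one-letter kind (L, I, F, M, or B) and is placed in the slot of the first free execution unit of that kind, with the slots ordered as the execution units are arranged. The names `UNIT_ORDER` and `rearrange` are hypothetical.

```python
# Assumed physical order of the execution units of the third embodiment.
UNIT_ORDER = ["LU0", "IU0", "IU1", "FU0", "FU1", "MU0", "MU1", "BU0"]


def rearrange(word):
    """Map the basic instructions of one instruction word onto unit slots.

    word: list of (kind, text) pairs, where kind is 'L', 'I', 'F', 'M' or 'B'.
    Returns a list with one entry per execution unit (None = nothing issued).
    """
    slots = [None] * len(UNIT_ORDER)
    for kind, text in word:
        for i, unit in enumerate(UNIT_ORDER):
            # Take the first free unit whose name starts with the kind letter.
            if unit.startswith(kind) and slots[i] is None:
                slots[i] = text
                break
        else:
            raise ValueError(f"no free {kind}U unit for {text!r}")
    return slots
```

For example, a word holding a branch followed by an integer instruction would come out with the integer instruction in the IU0 slot and the branch in the BU0 slot, so that each output position lines up with one execution unit.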

The parallel processor 28 having the above 
structure can achieve the same effects as the 
parallel processor 22 of Example 1 of the second 
embodiment. In other words, the issuance of basic 
instructions from the instruction issue unit 80 to 
the instruction execution units can be facilitated, 
and the operation speed can be increased. 



Additionally, the circuit size of the
parallel processor 28 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction execution units, as in the foregoing
examples.
(Example 2) 

FIG. 23 shows the structure of a second 
example of the parallel processor in accordance with 
the third embodiment of the present invention. As
shown in FIG. 23, the parallel processor 29 has the
same structure as the parallel processor 23 shown in
FIG. 12, comprising a conversion unit 124 in the
instruction issue unit 81. The structure and
operation of the conversion unit 124 are the same as
the structure and operation of the conversion unit
115 shown in FIGS. 10 and 11.



In the parallel processor 29 of this
example, the instruction issue unit 81 issues each
basic instruction to the corresponding one of the
instruction execution units, only after the
conversion unit 124 rearranges the basic
instructions, which are contained in each
instruction word supplied from the instruction fetch
unit 55, in accordance with the arrangement of the
instruction execution units. Thus, wires can be
shortened as a whole, and the operation speed can be
increased.

Additionally, the circuit size of the 
parallel processor 29 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 55, as in the foregoing
examples.
(Example 3)

FIG. 24 shows the structure of a third 
example of the parallel processor in accordance with
the third embodiment of the present invention. As
shown in FIG. 24, the parallel processor 30 has
substantially the same structure as the parallel
processor 24 shown in FIG. 14. The instruction
fetch unit 56 includes a first conversion unit 125
that rearranges basic instructions contained in each
fetched instruction word in accordance with the
arrangement of the instruction execution units. The
instruction issue unit 82 includes a second
conversion unit 126 that further rearranges basic
instructions contained in each instruction word
supplied from the instruction fetch unit 56 in
accordance with the arrangement of the instruction
execution units.

The first conversion unit 125 performs
"preprocessing" of the rearrangement of basic
instructions, and the second conversion unit 126
performs "postprocessing" of the rearrangement of
the basic instructions.

In an actual circuit, the processes in the
instruction fetch unit 56 and the instruction issue
unit 82 are pipelined in order to improve the
performance of the parallel processor. Because of
that, the difference in processing time between the
instruction fetch unit 56 and the instruction issue
unit 82 should be as small as possible to optimize
the pipeline effects. Therefore, the rearrangement
process is divided into the "preprocessing" and
"postprocessing", so that the difference in
processing time between the instruction fetch unit
56 and the instruction issue unit 82 is small.

By the parallel processor of this example 
having the above structure, wires can be shortened 
as a whole, and the operation speed can be increased.

Additionally, the circuit size of the
parallel processor 30 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 56, as in the foregoing
examples.
(Example 4) 

FIG. 25 shows the structure of a fourth 
example of the parallel processor in accordance
with the third embodiment of the present invention. 
As shown in FIG. 25, the parallel processor 31 has 
the same structure as the parallel processor 25 
shown in FIG. 16. The instruction fetch unit 57 
includes a conversion unit 127, and the instruction 
issue unit 83 includes a judgment unit 107. 

The structure and operation of the 
conversion unit 127 are the same as the structure 
and operation of the conversion unit 115 shown in 
FIGS. 10 and 11. The structure and operation of the 
judgment unit 107 are the same as the structure and 
operation of the judgment unit 103 shown in FIG. 6. 

By the parallel processor of this example 
having the above structure, the same effects as the 
parallel processor of Example 4 of the second 
embodiment can be obtained. More specifically, the 
instruction issue unit 83 including the judgment 
unit 107 enables accurate and efficient parallel 
processing of basic instructions, thereby increasing 
the reliability of the operation. The instruction 
fetch unit 57 including the conversion unit 127 
facilitates the issuance of basic instructions to 
the instruction execution units, thereby increasing 
the operation speed. 
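
The role of the judgment unit may be pictured with the following sketch. The actual judgment unit is the one shown in FIG. 6; the sketch merely assumes, for illustration, that basic instructions are issued in order and that the judgment consists of checking whether an idle execution unit of the required kind remains. Data dependencies are ignored, and the function name `issuable_prefix` is hypothetical.

```python
from collections import Counter


def issuable_prefix(kinds, free_units):
    """Return how many leading basic instructions can be issued together.

    kinds: kinds of the basic instructions in one word, e.g. ['I', 'M'].
    free_units: mapping from kind to the number of idle units of that kind.
    """
    remaining = Counter(free_units)
    count = 0
    for kind in kinds:
        if remaining[kind] == 0:
            # Assumed in-order issue: stop at the first instruction
            # for which no idle unit of the required kind is left.
            break
        remaining[kind] -= 1
        count += 1
    return count
```

With both integer units idle and both special-purpose units busy, a word of kinds `['I', 'M']` would yield 1, so only the first basic instruction is issued in that cycle.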

Additionally, the circuit size of the 
parallel processor 31 may be reduced by restricting 
in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to 
the instruction fetch unit 57, as in the foregoing 
examples.
(Example 5) 

FIG. 26 shows the structure of a fifth
example of the parallel processor in accordance with
the third embodiment of the present invention. As
shown in FIG. 26, the parallel processor 32 has the
same structure as the parallel processor 26 of
Example 5 of the second embodiment shown in FIG. 18.
The instruction issue unit 84 includes a conversion
unit 128 and a judgment unit 108.

The structure and operation of the
conversion unit 128 are the same as the structure
and operation of the conversion unit 115 shown in
FIGS. 10 and 11. The structure and operation of the
judgment unit 108 are the same as the structure and
operation of the judgment unit 103.

By the parallel processor of this example
having the above structure, the same effects as
obtained by the parallel processor 26 of Example 5
of the second embodiment can be obtained. More
specifically, the instruction issue unit 84
including the judgment unit 108 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction issue unit 84 further including the
conversion unit 128 facilitates the issuance of
basic instructions to the instruction execution
units, thereby increasing the operation speed.

Additionally, the circuit size of the 
parallel processor 32 may be reduced by restricting 
in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to 
the instruction fetch unit 58, as in the foregoing
examples.
(Example 6) 

FIG. 27 shows the structure of a sixth 
example of the parallel processor in accordance with 
the third embodiment of the present invention. As
shown in FIG. 27, the parallel processor 33 has the
same structure as the parallel processor 27 shown
in FIG. 20.

The structures and operations of a first 
conversion unit 129 and a second conversion unit 130 
are the same as the structures and operations of the 
first conversion unit 117 and the second conversion
unit 118 shown in FIG. 14. The structure and 
operation of a judgment unit 109 are the same as the 
structure and operation of the judgment unit 103. 

By the parallel processor of this example
having the above structure, the same effects as
obtained by the parallel processor 27 of Example 6
of the second embodiment can be obtained. More
specifically, the instruction issue unit 85
including the judgment unit 109 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction fetch unit 59 including the first
conversion unit 129 and the instruction issue unit
85 including the second conversion unit 130
facilitate the issuance of basic instructions from
the instruction issue unit 85 to the instruction
execution units, thereby increasing the operation
speed.

Additionally, the circuit size of the 
parallel processor 33 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 59, as in the foregoing
examples.

[Fourth Embodiment]

As shown in FIGS. 28 to 33, parallel
processors 34 to 39 in accordance with a fourth
embodiment of the present invention each comprise
an instruction fetch unit 60-65 connected to the
memory 12, an instruction issue unit 86-91 connected
to the instruction fetch unit 60-65, instruction
execution units LU0, LU1, IU0, IU1, FU0, FU1, BU0,
and BU1 connected to the instruction issue unit
86-91, and a register unit 101 connected to all the
instruction execution units. In this embodiment,
the instruction execution unit LU1 is a load store
instruction execution unit that executes load
instructions and store instructions. The
instruction execution unit BU1 is a branch
instruction execution unit that executes branch
instructions. When the execution is completed, the
instruction execution unit BU1 notifies the
instruction issue unit 86-91 of the end of the
execution.

In the following, the parallel processor
in accordance with the fourth embodiment of the
present invention will be described by way of
examples in which the maximum basic instruction word
length contained in each instruction word is 4.
In FIGS. 28 to 33, the maximum basic instruction
word length being 4 is indicated by four arrows from
the instruction fetch unit 60-65 to the instruction
issue unit 86-91. However, it should be understood
that the maximum basic instruction word length in
the fourth embodiment is not limited to 4.
(Example 1)

FIG. 28 shows the structure of a first 
example of the parallel processor in accordance with 
the fourth embodiment of the present invention. As 
shown in FIG. 28, the parallel processor 34 
comprises a conversion unit 131 in the instruction
fetch unit 60. The structure and operation of the
conversion unit 131 are the same as the structure
and operation of the conversion unit 115 of Example
1 of the second embodiment. More specifically, the
conversion unit 131 rearranges basic instructions
contained in each fetched instruction word, in
accordance with the arrangement of the instruction
execution units, and supplies the rearranged basic
instructions to the instruction issue unit 86.

By the parallel processor 34 having the
above structure, the same effects as obtained by the
parallel processor 22 of Example 1 of the second
embodiment can also be obtained. More specifically,
the issuance of basic instructions from the
instruction issue unit 86 to the instruction
execution units can be facilitated, and the
operation speed can be increased accordingly.

Additionally, the circuit size of the
parallel processor 34 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 60, as in the foregoing
embodiments.
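
Why a restricted arrangement can reduce circuit size may be seen from a rough count. The sketch below assumes one particular restriction that the text does not specify: every instruction word is full, each basic instruction targets a distinct execution unit, and the basic instructions appear in the same order as the execution units. It counts the distinct (slot, unit) routes that the rearranging circuit must provide. The function name `needed_connections` is hypothetical.

```python
from itertools import combinations, permutations


def needed_connections(num_units, word_len, sorted_only):
    """Count distinct (slot, unit) routes over all allowed instruction words.

    sorted_only=True models the assumed restriction that basic instructions
    appear in execution-unit order; False allows any order.
    """
    if sorted_only:
        words = combinations(range(num_units), word_len)
    else:
        words = permutations(range(num_units), word_len)
    routes = set()
    for word in words:
        # enumerate() pairs each slot position with the unit it must reach.
        routes.update(enumerate(word))
    return len(routes)
```

Under these assumptions, with eight execution units and four basic instructions per word, the unrestricted case needs 32 routes and the restricted case needs 20, which suggests how a smaller selection circuit could suffice.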

(Example 2) 

FIG. 29 shows the structure of a second 
example of the parallel processor in accordance with 
the fourth embodiment of the present invention. As
shown in FIG. 29, the parallel processor 35 has the
same structure as the parallel processor 23 shown in
FIG. 12, in that the instruction issue unit 87
includes a conversion unit 132. The structure and
operation of the conversion unit 132 are the same as
the structure and operation of the conversion unit
115 shown in FIGS. 10 and 11.

In the parallel processor 35 of this
example, the instruction issue unit 87 rearranges
basic instructions contained in each instruction
word supplied from the instruction fetch unit 61, in
accordance with the arrangement of the instruction
execution units, and then supplies the rearranged
basic instructions to the instruction execution
units. Thus, wires can be shortened as a whole, and
the operation speed can be increased.

Additionally, the circuit size of the
parallel processor 35 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 61, as in the foregoing
examples.
(Example 3)

FIG. 30 shows the structure of a third
example of the parallel processor in accordance with
the fourth embodiment of the present invention. As
shown in FIG. 30, the parallel processor 36 has the
same structure as the parallel processor 24 shown in
FIG. 14. The instruction fetch unit 62 of this
parallel processor 36 includes a first conversion
unit 133 that rearranges basic instructions
contained in each fetched instruction word, in
accordance with the arrangement of the instruction
execution units. The instruction issue unit 88 of
the parallel processor 36 includes a second
conversion unit 134 that further rearranges the
basic instructions contained in each instruction
word supplied from the instruction fetch unit 62, in
accordance with the arrangement of the instruction
execution units.

The first conversion unit 133 performs
"preprocessing" of the rearrangement of basic
instructions, and the second conversion unit 134
performs "postprocessing" of the rearrangement of
the basic instructions.

To improve the performance of the parallel
processor in an actual circuit, the processes in the
instruction fetch unit 62 and the instruction issue
unit 88 are pipelined. Because of that, the
difference in processing time between the instruction
fetch unit 62 and the instruction issue unit 88
should be as small as possible to optimize the
pipeline effects. Therefore, the rearrangement
process is divided into the "preprocessing" and
"postprocessing", so that the difference in
processing time between the instruction fetch unit
62 and the instruction issue unit 88 is small.

By the parallel processor 36 of this
example having the above structure, wires can be
shortened as a whole, and the operation speed can be
increased.

Additionally, the circuit size of the
parallel processor 36 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 62, as in the foregoing
examples.
(Example 4) 

FIG. 31 shows the structure of a fourth
example of the parallel processor in accordance with
the fourth embodiment of the present invention. As
shown in FIG. 31, the parallel processor 37 has the
same structure as the parallel processor 25 shown in
FIG. 16, in that the instruction fetch unit 63
includes a conversion unit 135 and the instruction
issue unit 89 includes a judgment unit 110.

The structure and operation of the
conversion unit 135 are the same as the structure
and operation of the conversion unit 115 shown in
FIGS. 10 and 11. On the other hand, the structure
and operation of the judgment unit 110 are the same
as the structure and operation of the judgment unit
103 shown in FIG. 6.

By the parallel processor 37 of this
example having the above structure, the same effects
as obtained by the parallel processor 25 of Example
4 of the second embodiment can be obtained. More
specifically, the instruction issue unit 89
including the judgment unit 110 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction fetch unit 63 including the
conversion unit 135 facilitates the issuance of
basic instructions from the instruction issue unit
89 to the instruction execution units, thereby
increasing the operation speed.

Additionally, the circuit size of the
parallel processor 37 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 63, as in the foregoing
examples.

(Example 5) 

FIG. 32 shows the structure of a fifth
example of the parallel processor in accordance with
the fourth embodiment of the present invention. As
shown in FIG. 32, the parallel processor 38 has the
same structure as the parallel processor 26 of
Example 5 of the second embodiment shown in FIG. 18,
in that the instruction issue unit 90 includes a
conversion unit 136 and a judgment unit 111.

The structure and operation of the
conversion unit 136 are the same as the structure
and operation of the conversion unit 115 shown in
FIGS. 10 and 11. On the other hand, the structure
and operation of the judgment unit 111 are the same
as the structure and operation of the judgment unit
103 shown in FIG. 6.

By the parallel processor of this example
having the above structure, the same effects as
obtained by the parallel processor 26 of Example 5
of the second embodiment can be obtained. More
specifically, the instruction issue unit 90
including the judgment unit 111 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction issue unit 90 further including the
conversion unit 136 facilitates the issuance of
basic instructions to the instruction execution
units, thereby increasing the operation speed.

Additionally, the circuit size of the 
parallel processor 38 may be reduced by restricting 
in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to
the instruction fetch unit 64, as in the foregoing
examples.
(Example 6) 

FIG. 33 shows the structure of a sixth 
example of the parallel processor in accordance with
the fourth embodiment of the present invention. As
shown in FIG. 33, the parallel processor 39 has the
same structure as the parallel processor 27 shown in
FIG. 20.

The structures and operations of a first
conversion unit 137 and a second conversion unit 138
are the same as the structures and operations of the
first conversion unit 117 and the second conversion
unit 118 shown in FIG. 14. On the other hand, the
structure and operation of the judgment unit 112 are
the same as the structure and operation of the
judgment unit 103 shown in FIG. 6.

By the parallel processor 39 of this
example having the above structure, the same effects
as obtained by the parallel processor of Example 6
of the second embodiment can be obtained. More
specifically, the instruction issue unit 91
including the judgment unit 112 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction fetch unit 65 including the first
conversion unit 137 and the instruction issue unit
91 further including the second conversion unit 138
facilitate the issuance of basic instructions from
the instruction issue unit 91 to the instruction
execution units, thereby increasing the operation
speed.



Additionally, the circuit size of the 
parallel processor 39 may be reduced by restricting 
in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to 
the instruction fetch unit 65, as in the foregoing 
examples.

[Fifth Embodiment] 

As shown in FIGS. 34 to 39, parallel 
processors 40 to 45 in accordance with a fifth 
embodiment of the present invention each comprise an 
instruction fetch unit 66-71 connected to the memory 
12, an instruction issue unit 92-97 connected to the 
instruction fetch unit 66-71, instruction execution 
units LU0, LU1, IU0, IU1, FU0, FU1, MU0, MU1, BU0,
and BU1, and a register unit 102 connected to all
the instruction execution units. 

In the following, the parallel processor 
in accordance with the fifth embodiment of the 
present invention will be described by way of 
examples in which the maximum basic instruction word 
length contained in each instruction word is 4. In
FIGS. 34 to 39, the maximum basic instruction word
length being 4 is indicated by four arrows extending
from the instruction fetch unit 66-71 to the
instruction issue unit 92-97.

It should be understood that the maximum
basic instruction word length is not limited to 4 in
this embodiment.
(Example 1) 

FIG. 34 shows the structure of a first 
example of the parallel processor in accordance with 
the fifth embodiment of the present invention. As 
shown in FIG. 34, the parallel processor 40 
comprises a conversion unit 139 in the instruction 
fetch unit 66. The structure and operation of the 
conversion unit 139 are the same as the structure
and operation of the conversion unit 115 of Example
1 of the second embodiment of the present invention.
The conversion unit 139 rearranges basic
instructions contained in each fetched instruction
word, in accordance with the arrangement of the
instruction execution units, and then supplies the
rearranged basic instructions to the instruction
issue unit 92.

By the parallel processor 40 having the 
above structure, the same effects as obtained by the 
parallel processor 22 of Example 1 of the second 
embodiment can be obtained. More specifically, the 
issuance of basic instructions from the instruction
issue unit 92 to the instruction execution units can
be facilitated, and the operation speed can be
increased accordingly.

Additionally, the circuit size of the 
parallel processor 40 may be reduced by restricting 
in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to 
the instruction fetch unit 66, as in the foregoing 
embodiments.
(Example 2) 

FIG. 35 shows the structure of a second 
example of the parallel processor in accordance with 
the fifth embodiment of the present invention. As
shown in FIG. 35, the parallel processor 41 has the 
same structure as the parallel processor 23 shown in 
FIG. 12, in that the instruction issue unit 93 
includes a conversion unit 140. The structure and 
operation of the conversion unit 140 are the same as 
the structure and operation of the conversion unit 
115 shown in FIGS. 10 and 11. 

In the parallel processor 41 of this 
example, the instruction issue unit 93 rearranges 
basic instructions contained in each instruction 
word supplied from the instruction fetch unit 67,
and then supplies the rearranged basic instructions
to the instruction execution units. Thus, wires can
be shortened as a whole, and the operation speed can
be increased.

Additionally, the circuit size of the
parallel processor 41 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 67, as in the foregoing
examples.
(Example 3)

FIG. 36 shows the structure of a third
example of the parallel processor in accordance with
the fifth embodiment of the present invention. As
shown in FIG. 36, the parallel processor 42 of this
example has the same structure as the parallel
processor 24 shown in FIG. 14. The instruction
fetch unit 68 of the parallel processor 42 includes
a first conversion unit 141 that rearranges basic
instructions contained in each fetched instruction
word in accordance with the arrangement of the
instruction execution units. The instruction issue
unit 94 of the parallel processor 42 includes a
second conversion unit 142 that further rearranges
basic instructions contained in each instruction
word supplied from the instruction fetch unit 68 in
accordance with the arrangement of the instruction
execution units.

The first conversion unit 141 performs
"preprocessing" of the rearrangement of basic
instructions, and the second conversion unit 142
performs "postprocessing" of the rearrangement of 
the basic instructions. 

In order to improve the performance of the
parallel processor in an actual circuit, the
processes in the instruction fetch unit 68 and the
instruction issue unit 94 are pipelined. Because of
that, the difference in processing time between the
instruction fetch unit 68 and the instruction issue
unit 94 should be as small as possible to optimize
the pipeline effects. Therefore, the rearrangement
process is divided into the "preprocessing" and
"postprocessing", so that the difference in 
processing time between the instruction fetch unit 
68 and the instruction issue unit 94 can be small. 
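
The benefit of dividing the rearrangement between the two pipelined units can be illustrated numerically. The sketch below assumes, purely for illustration, that the rearrangement consists of a number of equal steps that can each be placed in either stage, and that the pipeline cycle time is set by the slower of the two stages. The delay values in the usage note are hypothetical and are not taken from the specification.

```python
def best_split(fetch_delay, issue_delay, rearrange_steps):
    """Return (pre_steps, cycle_time) minimizing the slower stage's delay.

    pre_steps rearrangement steps are done as "preprocessing" in the fetch
    stage; the rest are done as "postprocessing" in the issue stage.
    """
    best = None
    for pre in range(rearrange_steps + 1):
        post = rearrange_steps - pre
        # A two-stage pipeline runs at the pace of its slower stage.
        cycle = max(fetch_delay + pre, issue_delay + post)
        if best is None or cycle < best[1]:
            best = (pre, cycle)
    return best
```

With a fetch delay of 3, an issue delay of 2, and 5 rearrangement steps, placing all steps in the fetch stage gives a cycle of 8 and placing all in the issue stage gives 7, whereas `best_split(3, 2, 5)` selects 2 preprocessing steps for a cycle of 5, at which point the two stages take equal time.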

By the parallel processor 42 of this
example having the above structure, wires can be
shortened as a whole, and the operation speed can be
increased.

Additionally, the circuit size of the
parallel processor 42 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 68, as in the foregoing
examples.
(Example 4) 

FIG. 37 shows the structure of a fourth
example of the parallel processor in accordance with
the fifth embodiment of the present invention. As
shown in FIG. 37, the parallel processor 43 has the
same structure as the parallel processor 25 shown in
FIG. 16, in that the instruction fetch unit 69
includes a conversion unit 143 and the instruction
issue unit 95 includes a judgment unit 113.

The structure and operation of the 
conversion unit 143 are the same as the structure
and operation of the conversion unit 115 shown in
FIGS. 10 and 11. On the other hand, the structure 
and operation of the judgment unit 113 are the same 
as the structure and operation of the judgment unit 
103 shown in FIG. 6. 

By the parallel processor 43 of this
example having the above structure, the same effects
as obtained by the parallel processor 25 of Example
4 of the second embodiment can be obtained. More
specifically, the instruction issue unit 95
including the judgment unit 113 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction fetch unit 69 including the
conversion unit 143 facilitates the issuance of
basic instructions from the instruction issue unit
95 to the instruction execution units, thereby
increasing the operation speed.

Additionally, the circuit size of the 
parallel processor 43 may be reduced by restricting 
in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to
the instruction fetch unit 69, as in the foregoing
examples.
(Example 5) 

FIG. 38 shows the structure of a fifth 
example of the parallel processor in accordance with
the fifth embodiment of the present invention. As
shown in FIG. 38, the parallel processor 44 of this
example has the same structure as the parallel
processor 26 of Example 5 of the second embodiment
shown in FIG. 18, in that the instruction issue unit
96 includes a conversion unit 144 and a judgment
unit 114.

The structure and operation of the 
conversion unit 144 are the same as the structure 
and operation of the conversion unit 115 shown in
FIGS. 10 and 11. On the other hand, the structure
and operation of the judgment unit 114 are the same 
as the structure and operation of the judgment unit 
103 shown in FIG. 6. 

By the parallel processor 44 of this
example having the above structure, the same effects
as obtained by the parallel processor 26 of Example
5 of the second embodiment can be obtained. More
specifically, the instruction issue unit 96
including the judgment unit 114 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction issue unit 96 further including the
conversion unit 144 facilitates the issuance of
basic instructions to the instruction execution
units, thereby increasing the operation speed.

Additionally, the circuit size of the 
parallel processor 44 may be reduced by restricting
in advance the arrangement of basic instructions
contained in each instruction word to be supplied to
the instruction fetch unit 70, as in the foregoing
examples.
(Example 6)



FIG. 39 shows the structure of a sixth
example of the parallel processor in accordance with
the fifth embodiment of the present invention. As
shown in FIG. 39, the parallel processor 45 of this
example has the same structure as the parallel
processor 27 shown in FIG. 20. The instruction
fetch unit 71 includes a first conversion unit 145,
and the instruction issue unit 97 includes a second
conversion unit 146 and a judgment unit 219.

The structures and operations of the first
conversion unit 145 and the second conversion unit
146 are the same as the structures and operations of
the first conversion unit 117 and the second
conversion unit 118 shown in FIG. 14. On the other
hand, the structure and operation of the judgment
unit 219 are the same as the structure and operation
of the judgment unit 103 shown in FIG. 6.

By the parallel processor 45 of this
example having the above structure, the same effects
as obtained by the parallel processor 27 of Example
6 of the second embodiment can be obtained. More
specifically, the instruction issue unit 97
including the judgment unit 219 enables accurate and
efficient parallel processing of basic instructions,
thereby increasing the reliability of the operation.
The instruction fetch unit 71 including the first
conversion unit 145 and the instruction issue unit
97 including the second conversion unit 146
facilitate the issuance of basic instructions from
the instruction issue unit 97 to the instruction
execution units, thereby increasing the operation
speed.

Additionally, the circuit size of the 
parallel processor 45 may be reduced by restricting 
in advance the arrangement of basic instructions 
contained in each instruction word to be supplied to 
the instruction fetch unit 71, as in the foregoing 
examples.

The present invention is not limited to 
the specifically disclosed embodiments, but 
variations and modifications may be made without 
departing from the scope of the present invention. 

The present application is based on 
Japanese priority application No. 11-281957, filed 
on October 1, 1999, the entire contents of which are 
hereby incorporated by reference. 



